ABSTRACT



A method and system for concealing errors in one or more 
bad frames in a speech sequence as part of an encoded bit 
stream received in a decoder. When the speech sequence is 
voiced, the LTP-parameters in the bad frames are replaced 
by the corresponding parameters in the last frame. When the 
speech sequence is unvoiced, the LTP-parameters in the bad 
frames are replaced by values calculated based on the LTP 
history along with an adaptively-limited random term. 
METHOD AND SYSTEM FOR SPEECH FRAME ERROR CONCEALMENT IN
SPEECH DECODING

FIELD OF THE INVENTION

The present invention relates generally to the decoding of
speech signals from an encoded bit stream and, more
particularly, to the concealment of corrupted speech parameters
when errors in speech frames are detected during speech
decoding.

BACKGROUND OF THE INVENTION

Speech and audio coding algorithms have a wide variety
of applications in communication, multimedia and storage
systems. The development of the coding algorithms is driven
by the need to save transmission and storage capacity while
maintaining the high quality of the synthesized signal. The
complexity of the coder is limited by, for example, the
processing power of the application platform. In some
applications, for example, voice storage, the encoder may be
highly complex, while the decoder should be as simple as
possible.

Modern speech codecs operate by processing the speech
signal in short segments called frames. A typical frame
length of a speech codec is 20 ms, which corresponds to 160
speech samples, assuming an 8 kHz sampling frequency. In
the wide band codecs, the typical frame length of 20 ms
corresponds to 320 speech samples, assuming a 16 kHz
sampling frequency. The frame may be further divided into
a number of sub-frames. For every frame, the encoder
determines a parametric representation of the input signal.
The parameters are quantized and transmitted through a
communication channel (or stored in a storage medium) in
a digital form. The decoder produces a synthesized speech
signal based on the received parameters, as shown in FIG. 1.

A typical set of extracted coding parameters includes
spectral parameters (such as Linear Predictive Coding (LPC)
parameters) to be used in short term prediction of the signal,
parameters to be used for long term prediction (LTP) of the
signal, various gain parameters, and excitation parameters.
The LTP parameter is closely related to the fundamental
frequency of the speech signal. This parameter is often
known as a so-called pitch-lag parameter, which describes
the fundamental periodicity in terms of speech samples.
Also, one of the gain parameters is very much related to the
fundamental periodicity and so it is called LTP gain. The
LTP gain is a very important parameter in making the speech
as natural as possible. The description of the coding
parameters above fits in general terms with a variety of speech
codecs, including the so-called Code-Excited Linear
Prediction (CELP) codecs, which have for some time been the
most successful speech codecs.

Speech parameters are transmitted through a communication
channel in a digital form. Sometimes the condition of
the communication channel changes, and that might cause
errors to the bit stream. This will cause frame errors (bad
frames), i.e., some of the parameters describing a particular
speech segment (typically 20 ms) are corrupted. There are
two kinds of frame errors: totally corrupted frames and
partially corrupted frames. Totally corrupted frames are
sometimes not received in the decoder at all. In the
packet-based transmission systems, like in normal internet
connections, the situation can arise when the data packet will
never reach the receiver, or the data packet arrives so late
that it cannot be used because of the real time nature of
spoken speech. The partially corrupted frame is a frame that
does arrive at the receiver and can still contain some
parameters that are not in error. This is usually the situation
in a circuit switched connection like in the existing GSM
connection. The bit-error rate (BER) in the partially corrupted
frames is typically around 0.5-5%.

From the description above, it can be seen that the two
cases of bad or corrupted frames will require different
approaches in dealing with the degradation in reconstructed
speech due to the loss of speech parameters.

The lost or erroneous speech frames are consequences of
the bad condition of the communication channel, which
causes errors to the bit stream. When an error is detected in
the received speech frame, an error correction procedure is
started. This error correction procedure usually includes a
substitution procedure and muting procedure. In the prior
art, the speech parameters of the bad frame are replaced by
attenuated or modified values from the previous good frame.
However, some parameters (such as excitation in CELP
parameters) in the corrupted frame may still be used for
decoding.

FIG. 2 shows the principle of the prior-art method. As
shown in FIG. 2, a buffer labeled "parameter history" is used
to store the speech parameters of the last good frame. When
a bad frame is detected, the Bad Frame Indicator (BFI) is set
to 1 and the error concealment procedure is started. When
the BFI is not set (BFI=0), the parameter history is updated
and speech parameters are used for decoding without error
concealment. In the prior-art system, the error concealment
procedure uses the parameter history for concealing the lost
or erroneous parameters in the corrupted frames. Some
speech parameters may be used from the received frame
even though it is classified as a bad frame (BFI=1). For
example, in a GSM Adaptive Multi-Rate (AMR) speech
codec (ETSI specification 06.91), the excitation vector from
the channel is always used. When the speech frames are
totally lost frames (e.g., in some IP-based transmission
systems), no parameters will be used from the received bad
frame. In some cases, no frame will be received, or the frame
will arrive so late that it has to be classified as a lost frame.

In a prior-art system, LTP-lag concealment uses the last
good LTP-lag value with a slightly modified fractional part,
and spectral parameters are replaced by the last good
parameters slightly shifted towards constant mean. The gains
(LTP and fixed codebook) may usually be replaced by the
attenuated last good value or by the median of several last
good values. The same substituted speech parameters are used
for all sub-frames with slight modification to some of them.

The prior-art LTP concealment may be adequate for
stationary speech signals, for example, voiced or stationary
speech. However, for non-stationary speech signals, the
prior-art method may cause unpleasant and audible artifacts.
For example, when the speech signal is unvoiced or
non-stationary, simply substituting the lag value in the bad
frame with the last good lag value has the effect of generating
a short voiced-speech segment in the middle of an
unvoiced-speech burst (See FIG. 10). The effect, also known as
the "bing" artifact, can be annoying.

It is advantageous and desirable to provide a method and
system for error concealment in speech decoding to improve
the speech quality.

SUMMARY OF THE INVENTION

The present invention takes advantage of the fact that
there is a recognizable relationship among the long-term
prediction (LTP) parameters in the speech signals. In
particular, the LTP-lag has a strong correlation with the LTP-
gain. When the LTP-gain is high and reasonably stable, the
LTP-lag is typically very stable and the variation between
adjacent lag values is small. In that case, the speech
parameters are indicative of a voiced-speech sequence. When the
LTP-gain is low or unstable, the LTP-lag is typically
unstable, and the speech parameters are indicative of an
unvoiced-speech sequence. Once the speech sequence is
classified as stationary (voiced) or non-stationary
(unvoiced), the corrupted or bad frame in the sequence can be
processed differently.
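
The classification described above can be sketched in a few lines of Python. This is only an illustration: the thresholds (0.5 for the gain, 10 for the lag spread) are borrowed from the exemplary rule given later in the description, while the function name `is_stationary` and the layout of the history buffers (oldest value first, at least two entries) are assumptions made for the example.

```python
from typing import Sequence


def is_stationary(gains: Sequence[float], lags: Sequence[float]) -> bool:
    """Classify the recent LTP history as stationary (voiced) or not.

    gains and lags hold LTP-gain and LTP-lag values of recent good
    frames, oldest first. The buffer layout is an assumption.
    """
    min_gain = min(gains)                 # minGain in the exemplary rule
    lag_dif = max(lags) - min(lags)       # LagDif in the exemplary rule
    last_gain, second_last_gain = gains[-1], gains[-2]
    # High, stable gain with a narrow lag spread, or two high recent gains.
    return (min_gain > 0.5 and lag_dif < 10) or (
        last_gain > 0.5 and second_last_gain > 0.5
    )
```

With a history such as gains 0.8, 0.9, 0.85 and lags 50, 51, 52 the sketch reports a stationary sequence; with gains 0.2, 0.7, 0.1 and lags 30, 90, 55 it reports a non-stationary one.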

Accordingly, the first aspect of the present invention is a
method for concealing errors in an encoded bit stream
indicative of speech signals received in a speech decoder,
wherein the encoded bit stream includes a plurality of
speech frames arranged in speech sequences, and the speech
frames include at least one corrupted frame preceded by one
or more non-corrupted frames, wherein the corrupted frame
includes a first long-term prediction lag value and a first
long-term prediction gain value, and the non-corrupted
frames include second long-term prediction lag values and
second long-term prediction gain values, and wherein the
second long-term prediction lag values include a last
long-term prediction lag value, and the second long-term
prediction gain values include a last long-term prediction gain
value, and the speech sequences include stationary and
non-stationary speech sequences, and wherein the corrupted
frame can be partially corrupted or totally corrupted. The
method comprises the steps of:

determining whether the first long-term prediction lag
value is within or outside an upper limit and a lower
limit determined based on the second long-term
prediction lag values;
replacing the first long-term prediction lag value in the
partially corrupted frame with a third lag value, when
the first long-term prediction lag value is outside the
upper and lower limits; and
retaining the first long-term prediction lag value in the
partially corrupted frame when the first long-term
prediction lag value is within the upper and lower limits.
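
As a rough illustration of the limit test in the steps above, the following Python sketch applies the five exemplary LTP-feature criteria given later in the description to a decoded lag `t_bf`. Several points are assumptions: the function and parameter names are invented for the example, the history buffers are taken to be ordered oldest first, the lag bound in criterion (3) is hard to read in the source and is assumed here to be minLag < T_bf < maxLag, and the replacement ("third") lag value is simply supplied by the caller because the steps above do not fix how it is obtained.

```python
from typing import Sequence


def lag_within_limits(
    t_bf: float, lags: Sequence[float], gains: Sequence[float]
) -> bool:
    """Return True when the decoded lag t_bf is consistent with the history."""
    min_lag, max_lag = min(lags), max(lags)
    mean_lag = sum(lags) / len(lags)
    lag_dif = max_lag - min_lag
    last_lag = lags[-1]
    min_gain = min(gains)
    last_gain, second_last_gain = gains[-1], gains[-2]
    return (
        (lag_dif < 10 and min_lag - 5 < t_bf < max_lag + 5)            # (1)
        or (last_gain > 0.5 and second_last_gain > 0.5
            and last_lag - 10 < t_bf < last_lag + 10)                  # (2)
        or (min_gain < 0.4 and last_gain == min_gain
            and min_lag < t_bf < max_lag)                              # (3), lag bound assumed
        or (lag_dif < 70 and min_lag < t_bf < max_lag)                 # (4)
        or (mean_lag < t_bf < max_lag)                                 # (5)
    )


def retain_or_replace(
    t_bf: float,
    lags: Sequence[float],
    gains: Sequence[float],
    replacement: float,
) -> float:
    """Keep t_bf when it lies within the limits, else use the replacement."""
    return t_bf if lag_within_limits(t_bf, lags, gains) else replacement
```

For a stable history (lags 50, 51, 52, 51 with gains near 0.9), a decoded lag of 53 is retained, while a decoded lag of 120 falls outside every criterion and is replaced.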
Alternatively, the method comprises the steps of: 
determining whether the speech sequence in which the
corrupted frame is arranged is stationary or
non-stationary, based on the second long-term prediction gain
values;
when the speech sequence is stationary, replacing the first
long-term prediction lag value in the corrupted frame
with the last long-term prediction lag value; and
when the speech sequence is non-stationary, replacing the
first long-term prediction lag value in the corrupted
frame with a third long-term prediction lag value
determined based on the second long-term prediction
lag values and an adaptively-limited random lag jitter,
and replacing the first long-term prediction gain value
in the corrupted frame with a third long-term prediction
gain value determined based on the second long-term
prediction gain values and an adaptively-limited
random gain jitter.
Preferably, the third long-term prediction lag value is
calculated based at least partially on a weighted median of
the second long-term prediction lag values, and the
adaptively-limited random lag jitter is a value bound by limits
determined based on the second long-term prediction lag
values.

Preferably, the third long-term prediction gain value is
calculated based at least partially on a weighted median of
the second long-term prediction gain values, and the
adaptively-limited random gain jitter is a value bound by limits
determined based on the second long-term prediction gain
values.
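
One way to picture a history-based lag with an adaptively-limited random jitter is the Update_lag rule given later in the description: the three largest values of the LTP-lag buffer are averaged (WAL), and a random term bounded by plus or minus half their spread (WLD) is added. The Python sketch below follows that rule under stated assumptions: WLD is taken to be the difference between the largest and smallest of the three values, the jitter is drawn uniformly, and the function name and the optional `rng` argument are invented for the example.

```python
import random
from typing import Optional, Sequence


def update_lag(
    lags: Sequence[float], rng: Optional[random.Random] = None
) -> float:
    """Replacement lag: WAL plus a random term limited to +/- WLD/2."""
    if rng is None:
        rng = random.Random()
    top3 = sorted(lags)[-3:]            # three biggest buffer values
    wal = sum(top3) / len(top3)         # weighted average lag (WAL)
    wld = top3[-1] - top3[0]            # weighted lag difference (WLD), assumed spread
    # The jitter limits adapt to the history: a flat history gives no jitter.
    return wal + rng.uniform(-wld / 2, wld / 2)
```

Because the limits are derived from the history itself, a flat lag history yields a replacement equal to the history value, while a widely varying history lets the replacement fluctuate over a comparable range.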

Alternatively, the method comprises the steps of:
determining whether the corrupted frame is partially
corrupted or totally corrupted;
replacing the first long-term prediction lag value in the
corrupted frame with a third lag value if the corrupted
frame is totally corrupted, wherein when the speech
sequence in which the totally corrupted frame is
arranged is stationary, the third lag value is set equal to
the last long-term prediction lag value, and when said
speech sequence is non-stationary, the third lag value is
determined based on the second long-term prediction lag
values and an adaptively-limited random lag jitter; and
replacing the first long-term prediction lag value in the
corrupted frame with a fourth lag value if the corrupted
frame is partially corrupted, wherein when the speech
sequence in which the partially corrupted frame is
arranged is stationary, the fourth lag value is set equal to
the last long-term prediction lag value, and when said
speech sequence is non-stationary, the fourth lag value is
set based on a decoded long-term prediction lag
value searched from an adaptive codebook associated
with the non-corrupted frame preceding the corrupted
frame.
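
The branching in the steps above can be summarized in a short Python sketch. It is a simplified reading, not a definitive implementation: the stationarity test and the jittered history value reuse the exemplary rules given later in the description, the function and parameter names are hypothetical, and the "fourth lag value" for a partially corrupted, non-stationary frame is taken to be the decoded lag itself.

```python
import random
from typing import Optional, Sequence


def conceal_lag(
    totally_corrupted: bool,
    decoded_lag: Optional[float],
    lags: Sequence[float],
    gains: Sequence[float],
    rng: Optional[random.Random] = None,
) -> float:
    """Pick a replacement LTP-lag for a corrupted frame (illustrative only)."""
    # Stationary (voiced) test, thresholds from the exemplary rule.
    stationary = (min(gains) > 0.5 and max(lags) - min(lags) < 10) or (
        gains[-1] > 0.5 and gains[-2] > 0.5
    )
    if stationary:
        # Both the third and the fourth lag value equal the last good lag.
        return float(lags[-1])
    if totally_corrupted or decoded_lag is None:
        # Third lag value: history average plus adaptively-limited jitter.
        if rng is None:
            rng = random.Random()
        top3 = sorted(lags)[-3:]
        wal = sum(top3) / len(top3)
        wld = top3[-1] - top3[0]        # spread of the three values, assumed
        return wal + rng.uniform(-wld / 2, wld / 2)
    # Fourth lag value: based on the lag decoded from the adaptive codebook.
    return float(decoded_lag)
```

In a voiced history the last good lag is returned regardless of how badly the frame is damaged; in an unvoiced history a lost frame receives a randomized value drawn around the recent lags, and a partially corrupted frame keeps its decoded lag.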
The second aspect of the present invention is a speech 
signal transmitter and receiver system for encoding speech 
signals in an encoded bit stream and decoding the encoded 
bit stream into synthesized speech, wherein the encoded bit 
stream includes a plurality of speech frames arranged in
speech sequences, and the speech frames include at least one 
corrupted frame preceded by one or more non-corrupted 
frames, wherein the corrupted frame is indicated by a first 
signal and includes a first long-term prediction lag value and 
a first long-term prediction gain value, and the non-cor- 
rupted frames include second long-term prediction lag val- 
ues and second long-term prediction gain values, and 
wherein the second long-term prediction lag values include 
a last long-term prediction lag value, and the second long- 
term prediction gain values include a last long-term predic- 
tion gain value, and the speech sequences include stationary 
and non-stationary speech sequences. The system com- 
prises: 

a first mechanism, responsive to the first signal, for 
determining whether the speech sequence in which the 
corrupted frame is arranged is stationary or non-sta- 
tionary, based on the second long-term prediction gain 
values, and for providing a second signal indicative of 
whether the speech sequence is stationary or non- 
stationary; and 

a second mechanism, responsive to the second signal, for 
replacing the first long-term prediction lag value in the 
corrupted frame with the last long-term prediction lag 
value when the speech sequence is stationary, and 
replacing the first long-term prediction lag value and 
the first long-term gain value in the corrupted frame 
with a third long-term prediction lag value and a third 
long-term prediction gain value, respectively, when the 
speech sequence is non-stationary, wherein the third 
long-term prediction lag value is determined based on 
the second long-term prediction lag values and an 
adaptively-limited random lag jitter, and the third long- 
term prediction gain value is determined based on the 
second long-term prediction gain values and an adap- 
tively-limited random gain jitter. 
Preferably, the third long-term prediction lag value is
calculated based at least partially on a weighted median of
the second long-term prediction lag values, and the
adaptively-limited random lag jitter is a value bound by limits
determined based on the second long-term prediction lag
values.

Preferably, the third long-term prediction gain value is
calculated based at least partially on a weighted median of
the second long-term prediction gain values, and the
adaptively-limited random gain jitter is a value bound by limits
determined based on the second long-term prediction gain
values.

The third aspect of the present invention is a decoder for
synthesizing speech from an encoded bit stream, wherein the
encoded bit stream includes a plurality of speech frames
arranged in speech sequences, and the speech frames include
at least one corrupted frame preceded by one or more
non-corrupted frames, wherein the corrupted frame is
indicated by a first signal and includes a first long-term
prediction lag value and a first long-term prediction gain value, and
the non-corrupted frames include second long-term
prediction lag values and second long-term prediction gain values,
and wherein the second long-term prediction lag values
include a last long-term prediction lag value and the second
long-term prediction gain values include a last long-term
prediction gain value and the speech sequences include
stationary and non-stationary speech sequences. The
decoder comprises:

a first mechanism, responsive to the first signal, for
determining whether the speech sequence in which the
corrupted frame is arranged is stationary or
non-stationary, based on the second long-term prediction gain
values, and for providing a second signal indicative of
whether the speech sequence is stationary or
non-stationary; and
a second mechanism, responsive to the second signal, for
replacing the first long-term prediction lag value in the
corrupted frame with the last long-term prediction lag
value when the speech sequence is stationary, and
replacing the first long-term prediction lag value and
the first long-term gain value in the corrupted frame
with a third long-term prediction lag value and a third
long-term prediction gain value, respectively, when the
speech sequence is non-stationary, wherein the third
long-term prediction lag value is determined based on
the second long-term prediction lag values and an
adaptively-limited random lag jitter, and the third
long-term prediction gain value is determined based on the
second long-term prediction gain values and an
adaptively-limited random gain jitter.
The fourth aspect of the present invention is a mobile
station, which is arranged to receive an encoded bit stream
containing speech data indicative of speech signals, wherein
the encoded bit stream includes a plurality of speech frames
arranged in speech sequences, and the speech frames include
at least one corrupted frame preceded by one or more
non-corrupted frames, wherein the corrupted frame is
indicated by a first signal and includes a first long-term
prediction lag value and a first long-term prediction gain value, and
the non-corrupted frames include second long-term
prediction lag values and second long-term prediction gain values,
and wherein the second long-term prediction lag values
include a last long-term prediction lag value and the second
long-term prediction gain values include a last long-term
prediction gain value and the speech sequences include
stationary and non-stationary speech sequences. The mobile
station comprises:



a first mechanism, responsive to the first signal, for 
determining whether the speech sequence in which the 
corrupted frame is arranged is stationary or non-sta- 
tionary, based on the second long-term prediction gain 
values, and for providing a second signal indicative of 
whether the speech sequence is stationary or non- 
stationary; and 
a second mechanism, responsive to the second signal, for 
replacing the first long-term prediction lag value in the 
corrupted frame with the last long-term prediction lag 
value when the speech sequence is stationary, and 
replacing the first long-term prediction lag value and 
the first long-term gain value in the corrupted frame 
with a third long-term prediction lag value and a third 
long-term prediction gain value, respectively, when the 
speech sequence is non-stationary, wherein the third 
long-term prediction lag value is determined based on 
the second long-term prediction lag values and an 
adaptively-limited random lag jitter, and the third long- 
term prediction gain value is determined based on the 
second long-term prediction gain values and an adap- 
tively-limited random gain jitter. 
The fifth aspect of the present invention is an element in 
a telecommunication network, which is arranged to receive 
an encoded bit stream containing speech data from a mobile 
station, wherein the speech data includes a plurality of 
speech frames arranged in speech sequences, and the speech
frames include at least one corrupted frame preceded by one 
or more non-corrupted frames, wherein the corrupted frame 
is indicated by a first signal and includes a first long-term 
prediction lag value and a first long-term prediction gain 
value, and the non-corrupted frames include second long- 
term prediction lag values and second long-term prediction 
gain values, and wherein the second long-term prediction 
lag values include a last long-term prediction lag value and 
the second long-term prediction gain values include a last 
long-term prediction gain value and the speech sequences 
include stationary and non-stationary speech sequences. The 
element comprises: 

a first mechanism, responsive to the first signal, for 
determining whether the speech sequence in which the 
corrupted frame is arranged is stationary or non-sta- 
tionary, based on the second long-term prediction gain 
values, and for providing a second signal indicative of 
whether the speech sequence is stationary or non- 
stationary; and 
a second mechanism, responsive to the second signal, for 
replacing the first long-term prediction lag value in the 
corrupted frame with the last long-term prediction lag 
value when the speech sequence is stationary, and 
replacing the first long-term prediction lag value and 
the first long-term gain value in the corrupted frame 
with a third long-term prediction lag value and a third 
long-term prediction gain value, respectively, when the 
speech sequence is non-stationary, wherein the third 
long-term prediction lag value is determined based on 
the second long-term prediction lag values and an 
adaptively-limited random lag jitter, and the third long- 
term prediction gain value is determined based on the 
second long-term prediction gain values and an adap- 
tively-limited random gain jitter. 
The present invention will become apparent upon reading 
the description taken in conjunction with FIGS. 3 to 11c.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating a generic distributed
speech codec, wherein the encoded bit stream containing
speech data is conveyed from an encoder to a decoder via a
communication channel or a storage medium.
FIG. 2 is a block diagram illustrating a prior-art error
concealment apparatus in a receiver.
FIG. 3 is a block diagram illustrating the error concealment
apparatus in a receiver, according to the present
invention.
FIG. 4 is a flow chart illustrating the method of error
concealment according to the present invention.
FIG. 5 is a diagrammatic representation of a mobile
station, which includes an error concealment module,
according to the present invention.
FIG. 6 is a diagrammatic representation of a telecommunication
network using a decoder, according to the present
invention.
FIG. 7 is a plot of LTP-parameters illustrating the lag and
gain profiles in a voiced speech sequence.
FIG. 8 is a plot of LTP-parameters illustrating the lag and
gain profiles in an unvoiced speech sequence.
FIG. 9 is a plot of LTP-lag values in a series of sub-frames
illustrating the difference between the prior-art error
concealment approach and the approach according to the
present invention.
FIG. 10 is another plot of LTP-lag values in a series of
sub-frames illustrating the difference between the prior-art
error concealment approach and the approach according to
the present invention.
FIG. 11a is a plot of speech signals illustrating an
error-free speech sequence having the location of the bad frame of
the speech channel, as shown in FIGS. 11b and 11c.
FIG. 11b is a plot of speech signals illustrating the
concealment of parameters in a bad frame according to the
prior art approach.
FIG. 11c is a plot of speech signals illustrating the
concealment of parameters in a bad frame according to the
present invention.

FIG. 3 illustrates a decoder 10, which includes a decoding
module 20 and an error concealment module 30. The
decoding module 20 receives a signal 140, which is normally
indicative of speech parameters 102 for speech synthesis.
The decoding module 20 is known in the art. The error
concealment module 30 is arranged to receive an encoded
bit stream 100, which includes a plurality of speech streams
arranged in speech sequences. A bad-frame detection device
32 is used to detect corrupted frames in the speech sequences
and provide a Bad-Frame-Indicator (BFI) signal 110
representing a BFI flag when a corrupted frame is detected. BFI
is also known in the art. The BFI signal 110 is used to control
two switches 40 and 42. Normally, the speech frames are not
corrupted and the BFI flag is 0. The terminal S is operatively
connected to the terminal 0 in the switches 40 and 42. The
speech parameters 102 are conveyed to a buffer, or
"parameter history" storage, 50 and the decoding module 20 for
speech synthesis. When a bad frame is detected by the
bad-frame detection device 32, the BFI flag is set to 1. The
terminal S is connected to the terminal 1 in the switches 40
and 42. Accordingly, the speech parameters 102 are
provided to an analyzer 70, and the speech parameters needed
for speech synthesis are provided by a parameter
concealment module 60 to the decoding module 20. The speech
parameters 102 typically include LPC parameters for short
term prediction, excitation parameters, a long-term
prediction (LTP) lag parameter, an LTP gain parameter and other
gain parameters. The parameter history storage 50 is used to
store the LTP-lag and LTP-gain of a number of
non-corrupted speech frames. The contents of the parameter history
storage 50 are constantly updated so that the last LTP-gain
parameter and the last LTP-lag parameter stored in the
storage 50 are those of the last non-corrupted speech frame.

When a corrupted frame is received, the BFI flag is set to 1
and the speech parameters 102 of the corrupted frame are
conveyed to the analyzer 70 through the switch 40. By
comparing the LTP-gain parameter in the corrupted frame and
the LTP-gain parameters stored in the storage 50, it is possible
for the analyzer 70 to determine whether the speech sequence
is stationary or non-stationary, based on the magnitude of, and
the variation in, the LTP-gain parameters in neighboring frames.
Typically, in a stationary sequence, the LTP-gain parameters
are high and reasonably stable, the LTP-lag value is stable
and the variation in adjacent LTP-lag values is small, as
shown in FIG. 7. In contrast, in a non-stationary sequence,
the LTP-gain parameters are low and unstable, and the
LTP-lag is also unstable, as shown in FIG. 8. The LTP-lag
values are changing more or less randomly. FIG. 7 shows the
speech sequence for the word "viinia". FIG. 8 shows the
speech sequence for the word "exhibition".

If the speech sequence that includes the corrupted frame
is voiced or stationary, the last good LTP-lag is retrieved
from the storage 50 and conveyed to the parameter
concealment module 60. The retrieved good LTP-lag is used to
replace the LTP-lag of the corrupted frame. Because the
LTP-lag in a stationary speech sequence is stable and its
variation is small, it is reasonable to use a previous LTP-lag
with small modification to conceal the corresponding
parameter in the corrupted frame. Subsequently, an RX signal 104
causes the replacement parameters, as denoted by reference
numeral 134, to be conveyed to the decoding module 20
through the switch 42.

If the speech sequence that includes the corrupted frame
is unvoiced or non-stationary, the analyzer 70 calculates a
replacement LTP-lag value and a replacement LTP-gain
value for parameter concealment. Because LTP-lag in a
non-stationary speech sequence is unstable and its variation
in adjacent frames is typically very large, parameter
concealment should allow the LTP-lag in an error-concealed
non-stationary sequence to fluctuate in a random fashion. If
the parameters in the corrupted frame are totally corrupted,
such as in a lost frame, the replacement LTP-lag is calculated
by using a weighted median of the previous good LTP-lag
values along with an adaptively-limited random jitter. The
adaptively-limited random jitter is allowed to vary within
limits calculated from the history of the LTP values, so that
the parameter fluctuation in an error-concealed segment is
similar to the previous good section of the same speech
sequence.

An exemplary rule for LTP-lag concealment is governed
by a set of conditions as follows:

If
minGain>0.5 AND LagDif<10; OR
lastGain>0.5 AND secondLastGain>0.5,

then the last received good LTP-lag is used for the totally
corrupted frame. Otherwise, Update_lag, a weighted average
of the LTP-lag buffer with randomization, is used for the
totally corrupted frame. Update_lag is calculated in a
manner as described below:

The LTP-lag buffer is sorted and the three biggest buffer
values are retrieved. The average of these three biggest
values is referred to as the weighted average lag (WAL), and
the difference from these biggest values is referred to as the
weighted lag difference (WLD).
Let RAND be the randomization with the scale of
(-WLD/2, WLD/2), then

Update_lag=WAL+RAND(-WLD/2, WLD/2),

wherein
minGain is the smallest value of the LTP-gain buffer;
LagDif is the difference between the smallest and the
largest LTP-lag values;
lastGain is the last received good LTP-gain; and
secondLastGain is the second last received good
LTP-gain.

If the parameters in the corrupted frame are partially
corrupted, then the LTP-lag value in the corrupted frame is
replaced accordingly. That the frame is partially corrupted is
determined by a set of exemplary LTP-feature criteria given
below:

If
(1) LagDif<10 AND (minLag-5)<T_bf<(maxLag+5); OR
(2) lastGain>0.5 AND secondLastGain>0.5 AND
(lastLag-10)<T_bf<(lastLag+10); OR
(3) minGain<0.4 AND lastGain=minGain AND
minLag<T_bf<maxLag; OR
(4) LagDif<70 AND minLag<T_bf<maxLag; OR
(5) meanLag<T_bf<maxLag

is true, then T_bf is used to replace the LTP-lag in the
corrupted frame. Otherwise, the corrupted frame is treated as
a totally corrupted frame, as described above. In the above
conditions:
maxLag is the largest value of the LTP-lag buffer;
meanLag is the average of the LTP-lag buffer;
minLag is the smallest value of the LTP-lag buffer;
lastLag is the last received good LTP-lag value; and
T_bf is a decoded LTP lag which is searched, when the BFI
is set, from the adaptive codebook as if the BFI is not
set.

Two examples of parameter concealment are shown in
FIGS. 9 and 10. As shown, the profile of the replacement
LTP-lag values in the bad frame, according to the prior art,
is rather flat, but the profile of the replacement, according to
the present invention, allows some fluctuation, similar to the
error-free profile. The difference between the prior art
approach and the present invention is further illustrated in

Typically, in the channel decoding process, the BER per
frame is a good indicator for the channel condition. When
the channel condition is good, the BER per frame is small
and a high percentage of the LTP-lag values in the erroneous
frames are correct. For example, when the frame error rate
(FER) is 0.2%, over 70% of the LTP-lag values are correct.
Even when the FER reaches 3%, about 60% of the LTP-lag
values are still correct. The CRC can accurately detect a bad
frame and set the BFI flag accordingly. However, the CRC
does not provide an estimation of the BER in the frame. If
the BFI flag is used as the only criterion for parameter
concealment, then a high percentage of the correct LTP-lag
values could be wasted. In order to prevent a large amount
of correct LTP-lags from being thrown away, it is possible
to adapt a decision criterion for parameter concealment
based on the LTP history. It is also possible to use the FER,
for example, as the decision criterion. If the LTP-lag meets
the decision criterion, no parameter concealment is
necessary. In that case, the analyzer 70 conveys the speech
parameters 102, as received through the switch 40, to the
parameter concealment module 60 which then conveys the
same to the decoding module 20 through the switch 42. If the
LTP-lag does not meet that decision criterion, then the
corrupted frame is further examined using the LTP-feature
criteria, as described hereinabove, for parameter
concealment.

In stationary speech sequences, the LTP-lag is very stable.
Whether the LTP-lag values in a corrupted frame are
correct or erroneous can be correctly predicted with high
probability. Thus, it is possible to adapt a very strict criterion
for parameter concealment. In non-stationary speech
sequences, it may be difficult to predict whether the LTP-lag
value in a corrupted frame is correct, because of the unstable
nature of the LTP parameters. However, whether the prediction
is correct or wrong is less important in non-stationary speech
than in stationary speech. While allowing erroneous LTP-lag
values to be used in decoding stationary speech may cause
the synthesized speech to be unrecognizable, allowing
erroneous LTP-lag values to be used in decoding non-stationary
speech usually only increases the audible artifacts. Thus, the
criterion for parameter concealment in non-stationary
speech can be relatively lax.

As mentioned earlier, the LTP-gain fluctuates greatly in
non-stationary speech. If the same LTP-gain value from the
last good frame is used repeatedly to replace the LTP-gain
of one or more corrupted frames in a speech sequence,
the LTP-gain profile in the gain concealed segment will be
flat (similar to the prior-art LTP-lag replacement, as shown
in FIGS. 7 and 8), in stark contrast to the fluctuating profile



FIGS, lib and 11c, respectively, based on the speech signals °^ '^'^ non-corrupted frames. The sudden change 

in an error-free channel, as shown in FIG. 11a. LTP-gain profile may cause unpleasant audible artifacts. In 

When the parameters in the corrupted frame are partially o""^'^^ '° minimize these audible artifacts, il is po,ssible to 

corrupted, the parameter concealment can be further opti- 55 '^^^ replacement LTP-gain value to fluctuate in the 

mized. In partiaUy corrupted frames, the LTP-lags in the error-concealed segment. For this purpose, the analyzer 70 

corrupted frames may still yield an acceptable synthesized c''" '"•'s" "^^^ '° determine the hmits between which the 

speech segment. Accordingly to the GSM specifications, the replacement LTP-gain value is allowed to fluctuate based on 

BFI flag is set by a Cyclic Redundancy Check (CRC) *e gam values in the LTP history. 

mechanism or other error detection mechanisms. These error 60 LTP-gain concealment can be carried out in a manner as 

detection mechanisms detect errors in the most significant described below. When the BFI is set, a replacement LTP- 

bits in the channel decoding process. Accordingly, even gain value is calculated according to a set of LTP-gain 

when only a few bits are erroneous, the error can be detected concealment rules. The replacement LTP-gain is denoted as 

and the BFI flag is set accordingly. In the prior-art parameter Updated_gain. 

concealment approach, the entire frame is discaided. As a 65 (1) If gainDff>0.5 AND lastGain=maxGain>0.9 AND 

result, information contained in the correct bits is thrown subBF=l, then Updated_5ain= 

away. (secondLastGain+thirdLastGain)/2; 
(2) If gainDif>0.5 AND lastGain=maxGain>0.9 AND
subBF=2, then Updated_gain=meanGain+randVar*
(maxGain-meanGain);

(3) If gainDif>0.5 AND lastGain=maxGain>0.9 AND
subBF=3, then Updated_gain=meanGain-randVar*
(meanGain-minGain);

(4) If gainDif>0.5 AND lastGain=maxGain>0.9 AND
subBF=4, then Updated_gain=meanGain+randVar*
(maxGain-meanGain);

In the previous conditions, Updated_gain cannot be larger
than lastGain. If the previous conditions cannot be met, the
following conditions are used:

(5) If gainDif>0.5, then Updated_gain=lastGain;

(6) If gainDif<0.5 AND lastGain=maxGain, then
Updated_gain=meanGain;

(7) If gainDif<0.5, then Updated_gain=lastGain,

wherein
meanGain is the average of the LTP-gain buffer;
maxGain is the largest value of the LTP-gain buffer;
minGain is the smallest value of the LTP-gain buffer;
randVar is a random value between 0 and 1;
gainDif is the difference between the smallest and the
largest LTP-gain values in the LTP-gain buffer;
lastGain is the last received good LTP-gain;
secondLastGain is the second last received good LTP-gain;
thirdLastGain is the third last received good LTP-gain;
and
subBF is the order of the subframe.

FIG. 4 illustrates the method of error-concealment,
according to the present invention. As the encoded bit stream
is received at step 160, the frame is checked to see if it is
corrupted at step 162. If the frame is not corrupted, then the
parameter history of the speech sequence is updated at step
164, and the speech parameters of the current frame are
decoded at step 166. The procedure then goes back to step
162. If the frame is bad or corrupted, the parameters are
retrieved from the parameter history storage at step 170.
Whether the corrupted frame is part of the stationary speech
sequence or non-stationary speech sequence is determined at
step 172. If the speech sequence is stationary, the LTP-lag of
the last good frame is used to replace the LTP-lag in the
corrupted frame at step 174. If the speech sequence is
non-stationary, a new lag value and new gain value are
calculated based on the LTP history at step 180, and they are
used to replace the corresponding parameters in the
corrupted frame at step 182.

FIG. 5 shows a block diagram of a mobile station 200
according to one exemplary embodiment of the invention.
The mobile station comprises parts typical of the device,
such as a microphone 201, keypad 207, display 206,
earphone 214, transmit/receive switch 208, antenna 209 and
control unit 205. In addition, the figure shows transmitter
and receiver blocks 204, 211 typical of a mobile station. The
transmitter block 204 comprises a coder 221 for coding the
speech signal. The transmitter block 204 also comprises
operations required for channel coding, deciphering and
modulation as well as RF functions, which have not been
drawn in FIG. 5 for clarity. The receiver block 211 also
comprises a decoding block 220 according to the invention.
Decoding block 220 comprises an error concealment module
222 like the parameter concealment module 30 shown in
FIG. 3. The signal coming from the microphone 201,
amplified at the amplification stage 202 and digitized in the
A/D converter, is taken to the transmitter block 204,
typically to the speech coding device comprised by the transmit
block. The transmission signal, which is processed,
modulated and amplified by the transmit block, is taken via the
transmit/receive switch 208 to the antenna 209. The signal to
be received is taken from the antenna via the transmit/
receive switch 208 to the receiver block 211, which
demodulates the received signal and decodes the deciphering and the
channel coding. The resulting speech signal is taken via the
D/A converter 212 to an amplifier 213 and further to an
earphone 214. The control unit 205 controls the operation of
the mobile station 200, reads the control commands given by
the user from the keypad 207 and gives messages to the user
by means of the display 206.

The parameter concealment module 30, according to the
invention, can also be used in a telecommunication network
300, such as an ordinary telephone network or a mobile
station network, such as the GSM network. FIG. 6 shows an
example of a block diagram of such a telecommunication
network. For example, the telecommunication network 300
can comprise telephone exchanges or corresponding
switching systems 360, to which ordinary telephones 370, base
stations 340, base station controllers 350 and other central
devices 355 of telecommunication networks are coupled.
Mobile stations 330 can establish connection to the
telecommunication network via the base stations 340. A
decoding block 320, which includes an error concealment module
322 similar to the error concealment module 30 shown in
FIG. 3, can be particularly advantageously placed in the base
station 340, for example. However, the decoding block 320
can also be placed in the base station controller 350 or other
central or switching device 355, for example. If the mobile
station system uses separate transcoders, for example,
between the base stations and the base station controllers, for
transforming the coded signal taken over the radio channel
into a typical 64 kbit/s signal transferred in a
telecommunication system and vice versa, the decoding block 320 can
also be placed in such a transcoder. In general, the decoding
block 320, including the parameter concealment module
322, can be placed in any element of the telecommunication
network 300, which transforms the coded data stream into an
uncoded data stream. The decoding block 320 decodes and
filters the coded speech signal coming from the mobile
station 330, whereafter the speech signal can be forwarded
uncompressed, in the usual manner, in the
telecommunication network 300.

It should be noted that the error concealment method of
the present invention has been described with respect to
stationary and non-stationary speech sequences, and that
stationary speech sequences are usually voiced and
non-stationary speech sequences are usually unvoiced. Thus, it
will be understood that the disclosed method is applicable to
error concealment in voiced and unvoiced speech sequences.

The present invention is applicable to CELP type speech
codecs and can be adapted to other types of speech codecs
as well. Thus, although the invention has been described
with respect to a preferred embodiment thereof, it will be
understood by those skilled in the art that the foregoing and
various other changes, omissions and deviations in the form
and detail thereof may be made without departing from the
spirit and scope of this invention.

What is claimed is:

1. A method for concealing errors in an encoded bit stream
indicative of speech signals received in a speech decoder,
wherein the encoded bit stream includes a plurality of
speech frames arranged in speech sequences, and the speech
frames include at least one partially corrupted frame
preceded by one or more non-corrupted frames, wherein the
partially corrupted frame includes a first long-term
prediction lag value and a first long-term prediction gain value, and
the non-corrupted frames include second long-term predic-
tion lag values and second long-term prediction gain values,
said method comprising the steps of:

providing an upper limit and a lower limit based on the
second long-term prediction lag values;
determining whether the first long-term prediction lag
value is within or outside the upper and lower limits;
replacing the first long-term prediction lag value in the
partially corrupted frame with a third lag value, when
the first long-term prediction lag value is outside the
upper and lower limits; and
retaining the first long-term prediction lag value in the
partially corrupted frame when the first long-term pre-
diction lag value is within the upper and lower limits.

2. The method of claim 1, further comprising the step of
replacing the first long-term prediction gain value in the 
partially corrupted frame with a third gain value, when the 
first long-term lag value is outside the upper and lower 
limits. 

3. The method of claim 1, wherein the third lag value is
calculated based on the second long-term prediction lag values
and an adaptively-limited random lag jitter bound by further 
limits determined based on the second long-term prediction 
lag values. 

4. The method of claim 2, wherein the third gain value is
calculated based on the second long-term prediction gain
values and an adaptively-limited random gain jitter bound 
by limits determined based on the second long-term predic- 
tion gain values. 
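
As an illustration of the retain-or-replace decision recited in the method claims, the exemplary LTP-feature criteria (1) to (5) of the description can be sketched in Python as follows. This is a minimal sketch and not a definitive implementation: the function name `lag_is_plausible`, the argument `t_bf` (the lag decoded from the bad frame) and the list-based history buffers are assumptions made for illustration, and the third clause of criterion (3) is poorly legible in the source text and is assumed here to be minLag < t_bf < maxLag.

```python
from statistics import mean


def lag_is_plausible(t_bf, lag_hist, gain_hist):
    """Return True when the lag decoded from a bad frame may be retained.

    t_bf      -- LTP-lag decoded from the bad frame as if the BFI were not set
    lag_hist  -- LTP-lags of recent good frames, oldest first (assumed buffer)
    gain_hist -- LTP-gains of recent good frames, oldest first (at least two)
    """
    min_lag, max_lag = min(lag_hist), max(lag_hist)
    mean_lag = mean(lag_hist)
    lag_dif = max_lag - min_lag
    last_lag = lag_hist[-1]
    min_gain = min(gain_hist)
    last_gain, second_last_gain = gain_hist[-1], gain_hist[-2]

    return (
        # (1) stable lag history: allow a margin of 5 around the buffer range
        (lag_dif < 10 and min_lag - 5 < t_bf < max_lag + 5)
        # (2) two high recent gains: stay within 10 of the last good lag
        or (last_gain > 0.5 and second_last_gain > 0.5
            and last_lag - 10 < t_bf < last_lag + 10)
        # (3) low gain that is also the last gain (third clause assumed)
        or (min_gain < 0.4 and last_gain == min_gain
            and min_lag < t_bf < max_lag)
        # (4) moderate lag spread: the lag must lie inside the buffer range
        or (lag_dif < 70 and min_lag < t_bf < max_lag)
        # (5) the lag lies between the buffer mean and maximum
        or (mean_lag < t_bf < max_lag)
    )
```

When the function returns False, the description treats the frame as totally corrupted and a replacement lag is computed from the LTP history instead.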

5. A speech signal transmitter and receiver system for
encoding speech signals in an encoded bit stream and 
decoding the encoded bit stream into synthesized speech, 
wherein the encoded bit stream includes a plurality of 
speech frames arranged in speech sequences, and the speech 
frames include at least one partially corrupted frame preceded
by one or more non-corrupted frames, wherein the
partially corrupted frame includes a first long-term predic- 
tion lag value and a first long-term prediction gain value, and 
the non-corrupted frames include second long-term predic- 
tion lag values and second long-term prediction gain values,
and a first signal is used to indicate the partially corrupted 
frame, said system comprising: 

a first means, responsive to the first signal, for determin- 
ing whether the first long term prediction lag is within 
an upper limit and a lower limit, and for providing a
second signal indicative of said determining; 
a second means, responsive to the second signal, for 
replacing the first long-term prediction lag value in the 
partially corrupted frame with a third lag value when 
the first long-term prediction lag value is outside the
upper and lower limits; and retaining the first long-term 
prediction lag value in the partially corrupted frame 
when the first long-term prediction lag value is within
the upper and lower limits. 

6. The system of claim 5, wherein the third lag value is
determined based on the second long-term prediction lag 
values and an adaptively-limited random lag jitter. 

7. The system of claim 5, wherein the second means 
further replaces the first long-term prediction gain value in 
the partially corrupted frame with a third gain value when
the first long-term prediction lag value is outside the
upper and lower limits. 

8. The system of claim 7, wherein the third gain value is 
determined based on the second long-term prediction gain 
values and an adaptively-limited random gain jitter. 

9. A decoder for synthesizing speech from an encoded bit 
stream, wherein the encoded bit stream includes a plurality 
of speech frames arranged in speech sequences, and the 
speech frames include at least one partially corrupted frame 
preceded by one or more non-corrupted frames, wherein the 
partially corrupted frame includes a first long-term predic- 
tion lag value and a first long-term prediction gain value, and 
the non-corrupted frames include second long-term predic- 
tion lag values and second long-term prediction gain values, 
and a first signal is used to indicate the partially corrupted 
frame, said decoder comprising: 

a first means, responsive to the first signal, for determin- 
ing whether the first long-term prediction lag is within 
an upper limit and a lower limit, and for providing a 
second signal indicative of said determining; 
a second means, responsive to the second signal, for 
replacing the first long-term prediction lag value in the 
partially corrupted frame with a third lag value when 
the first long-term prediction lag value is outside the 
upper and lower limits; and retaining the first long-term 
prediction lag value in the partially corrupted frame 
when the first long-term prediction lag value is within 
the upper and lower limits. 

10. The decoder of claim 9, wherein the third lag value is 
determined based on the second long-term prediction lag 
values and an adaptively-limited random lag jitter. 

11. The decoder of claim 9, wherein the second means 
further replaces the first long-term gain value in the partially 
corrupted frame with a third gain value when the first 
long-term prediction lag value is outside the upper and lower 
limits. 

12. The decoder of claim 11, wherein the third gain value 
is determined based on the second long-term prediction gain 
values and an adaptively-limited random gain jitter. 

13. A mobile station, which is arranged to receive an
encoded bit stream containing speech data indicative of
speech signals, wherein the encoded bit stream includes a
plurality of speech frames arranged in speech sequences, and 
the speech frames include at least one partially corrupted 
frame preceded by one or more non-corrupted frames, 
wherein the partially corrupted frame includes a first long- 
term prediction lag value and a first long-term prediction 
gain value, and the non-corrupted frames include second 
long-term prediction lag values and second long-term pre- 
diction gain values, and wherein a first signal is used to 
indicate the corrupted frame, said mobile station compris- 
ing: 

a first means, responsive to the first signal, for determin- 
ing whether the first long-term prediction lag is within 
an upper limit and a lower limit, and for providing a 
second signal indicative of said determining; 
a second means, responsive to the second signal, for 
replacing the first long-term prediction lag value in the 
partially corrupted frame with a third lag value when 
the first long-term prediction lag value is outside the 
upper and lower limits; and retaining the first long-term 
prediction lag value in the partially corrupted frame 
when the first long-term prediction lag value is within 
the upper and lower limits. 

14. The mobile station of claim 13, wherein the third lag
value is determined based on the second long-term predic- 
tion lag values and an adaptively-limited random lag jitter. 

15. The mobile station of claim 13, wherein the second 
means further replaces the first long-term gain value in the 
partially corrupted frame with a third gain value when the
first long-term prediction lag value is outside the upper and 
lower limits. 

16. The mobile station of claim 15, wherein the third gain
value is determined based on the second long-term predic- 
tion gain values and an adaptively-limited random gain 
jitter. 

17. An element in a telecommunication network, which is 
arranged to receive an encoded bit stream containing speech 
data from a mobile station, wherein the speech data includes 
a plurality of speech frames arranged in speech sequences, 
and the speech frames include at least one partially cor-
rupted frame preceded by one or more non-corrupted 
frames, wherein the partially corrupted frame includes a first 
long-term prediction lag value and a first long-term predic- 
tion gain value, and the non-corrupted frames include sec- 
ond long-term prediction lag values and second long-term
prediction gain values, and wherein a first signal is used to 
indicate the corrupted frame, said element comprising: 

a first means, responsive to the first signal, for determin- 
ing whether the first long-term prediction lag is within 
an upper limit and a lower limit, and for providing a
second signal indicative of said determining; 
a second means, responsive to the second signal, for 
replacing the first long-term prediction lag value in the 
partially corrupted frame with a third lag value when 
the first long-term prediction lag value is outside the 
upper and lower limits; and retaining the first long-term 
prediction lag value in the partially corrupted frame 
when the first long-term prediction lag value is within 
the upper and lower limits. 

18. The element of claim 17, wherein the third long-term 
prediction lag value is determined based on the second 
long-term prediction lag values and an adaptively-limited 
random lag jitter. 

19. The element of claim 17, wherein the third means 
further replaces the first long-term prediction gain value 
with a third gain value when the first long-term lag value is 
outside the upper and lower limits. 

20. The element of claim 19, wherein the third gain value 
is determined based on the second long-term prediction gain 
values and an adaptively-limited random gain jitter. 
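
By way of illustration only, the exemplary LTP-gain concealment rules (1) to (7) of the description can be sketched in Python as shown below. The function name `updated_gain`, the list-based gain buffer and the optional `rand_var` argument are assumptions introduced for this sketch. The expression lastGain=maxGain>0.9 is read here as "lastGain equals maxGain and that value exceeds 0.9", and the case gainDif equal to exactly 0.5, which the rules leave open, is assumed to fall back to lastGain.

```python
import random
from statistics import mean


def updated_gain(gain_hist, sub_bf, rand_var=None):
    """Return a replacement LTP-gain for one subframe of a bad frame.

    gain_hist -- LTP-gains of recent good frames, oldest first (at least three)
    sub_bf    -- order of the subframe within the frame (1 to 4)
    rand_var  -- random value between 0 and 1; drawn here when omitted
    """
    if rand_var is None:
        rand_var = random.random()

    mean_gain = mean(gain_hist)
    max_gain, min_gain = max(gain_hist), min(gain_hist)
    gain_dif = max_gain - min_gain
    last_gain, second_last, third_last = gain_hist[-1], gain_hist[-2], gain_hist[-3]

    # Rules (1) to (4): strongly fluctuating history that ends on its maximum.
    if (gain_dif > 0.5 and last_gain == max_gain and max_gain > 0.9
            and sub_bf in (1, 2, 3, 4)):
        if sub_bf == 1:
            gain = (second_last + third_last) / 2
        elif sub_bf == 3:
            gain = mean_gain - rand_var * (mean_gain - min_gain)
        else:  # subframes 2 and 4 share the same rule
            gain = mean_gain + rand_var * (max_gain - mean_gain)
        # Updated_gain cannot be larger than lastGain under these rules.
        return min(gain, last_gain)

    if gain_dif > 0.5:                                # rule (5)
        return last_gain
    if gain_dif < 0.5 and last_gain == max_gain:      # rule (6)
        return mean_gain
    return last_gain                                  # rule (7) and assumed fallback
```

For example, with a gain buffer of [0.2, 0.4, 0.6, 1.0] the first subframe receives the average of the second and third last gains, while the later subframes jitter around the buffer mean within the limits set by the smallest and largest buffered gains.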